ABSTRACT

Motivation: The need for new drugs and new targets is particularly 
compelling in an era that is witnessing an alarming increase of drug 
resistance in human pathogens. The identification of new targets of 
known drugs is a promising approach, which has proven successful in 
several cases. Here, we describe a database that includes information 
on 5153 putative drug-target pairs for 150 human pathogens derived 
from available drug-target crystallographic complexes. 
Availability and implementation: The TiPs database is freely 
available at http://biocomputing.it/tips. 

Contact: anna.tramontano@uniroma1.it or allegra.via@uniroma1.it 
1 INTRODUCTION

Novel mechanisms to escape therapy are constantly emerging 
among human pathogen populations, and this clearly urges the 
development, on one hand, of new drugs for the treatment of the 
diseases and, on the other hand, of rapid and effective methods 
to help expand the landscape of available treatment options 
(Hopkins et al., 2011). In this context, computational studies 
are called on to help identify novel therapeutic targets and char- 
acterize their interactions, and indeed a number of such efforts 
are described in the literature (Aguero et al., 2008; Kinnings 
et al., 2010; Lepore et al., 2011; Orti et al., 2009). However,
these are mostly devoted to the analysis of single targets or 
specific tropical disease pathogens. 

The TiPs database has been developed with the aim of facilitating
the identification of new therapeutic targets in >150 organisms
responsible for human infections. We performed a large-scale 
analysis to systematically identify candidate targets in the prote- 
omes of such organisms. The rationale of our approach is based 
on the intrinsic polypharmacological behaviour of compounds 
targeting homologous proteins (Paolini et al., 2006). We con- 
sidered all drug-target pairs for which the 3D structure of the 
complex is experimentally known and used the sequence of the 
target to identify its homologues in human pathogens. The evo- 
lutionary conservation of such homologues and their 3D struc- 
tures (available or predicted) were used to verify whether the 
original drug was in principle able to bind them as it does the 
original target. To this aim, stringent filters were applied to ensure 
that predicted binding sites and their interactions with the drug 
are as accurate as possible. Pathogen proteins predicted with high
confidence to be therapeutic targets and the putative drugs inter- 
acting with them were collected and annotated in TiPs. 

2 METHODS 

More than 400 human pathogen species were obtained from 'The 
Approved List of Biological Agents' provided by the Advisory 
Committee on Dangerous Pathogens. To unambiguously assign an iden- 
tifier (ID) to human pathogens, the names of the organisms were mapped 
onto the NCBI Taxonomy Database records (http://www.ncbi.nlm.nih. 
gov/Taxonomy/). 

Drug compounds and information on their molecular targets were 
obtained from DrugBank (http://www.drugbank.ca). The SMILES strings of
drugs annotated either as 'inhibitor', 'agonist' or 'antagonist' were used to 
associate them with ligands present in the PDB structure entries (Berman 
et al., 2012). Only identical compounds were considered (Tanimoto coef- 
ficient = 1). A total of 308 distinct drugs were observed in complex with at 
least one PDB structure. About 40% of these (119/308) occur in complex 
with their actual pharmaceutical target. These were used as starting points 
to predict potential drug targets in pathogens. The search for homologues 
in pathogens was performed using BLAST+ (Camacho et al., 2009) with 
default parameters against the nr database (ftp://ftp.ncbi.nlm.nih.gov/ 
blast/db/). We only retained highly reliable hits, i.e. those showing at 
least 40% sequence identity to the original target and e-value< 10~^. 
Pathogen taxonomic IDs were retrieved by matching the gi numbers of 
BLAST hits to the NCBI Taxonomy database. 
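
The hit-retention rule described above can be sketched as a simple filter. This is a minimal illustration, not the authors' pipeline: the `BlastHit` record, its field names and the toy hits are hypothetical, and the e-value cutoff is left as a required argument because the value passed in the example is only a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlastHit:
    """One BLAST hit. Field names are illustrative, not BLAST+ column names."""
    subject_id: str
    percent_identity: float  # sequence identity to the original target, 0-100
    evalue: float


def filter_reliable_hits(hits, max_evalue, min_identity=40.0):
    """Keep hits with identity >= min_identity and e-value < max_evalue."""
    return [
        h for h in hits
        if h.percent_identity >= min_identity and h.evalue < max_evalue
    ]


# Toy hits, invented for illustration only.
hits = [
    BlastHit("hit_a", 62.5, 1e-50),
    BlastHit("hit_b", 38.0, 1e-40),  # fails the identity filter
    BlastHit("hit_c", 45.0, 0.5),    # fails the e-value filter
]

# max_evalue=1e-5 is a placeholder, not the threshold used for TiPs.
reliable = filter_reliable_hits(hits, max_evalue=1e-5)
print([h.subject_id for h in reliable])
```

Keeping the cutoffs as arguments makes it easy to check how sensitive the candidate list is to each threshold.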

For each known drug-target complex, we defined the binding site as 
the subset of target residues having at least one atom within 3.5 Å of
any atom of the drug. The drug-binding site residues in the
predicted pathogen sequences were retrieved through a multiple sequence 
alignment (MSA) of the original target sequence with its homologues 
generated with T-Coffee (Taly et al., 2011). The number and type of
aligned residues were used to classify the binding site local conservation, 
both in terms of sequence coverage (percentage of binding site residues in 
the original target that could be aligned to the pathogen sequence) and 
identity (percentage of identical residues among the aligned binding site 
residues). Coverage and identity percentages were calculated separately 
for each pathogen sequence in the alignment. Only pathogen proteins 
showing at least 80% coverage in their binding sites were further con- 
sidered (4215). Among these 4215 reliable putative targets, only 41 have a 
solved structure in the PDB. Homology modelling (Kopp and Schwede, 
2004) was used to predict the structure of the remaining ones as follows: 
for each pathogen sequence, an MSA was generated using three iterations 
of HHblits (Remmert et al., 2012) (with default parameters) on the
non-redundant UniProt database. The MSA was used as an HHsearch query to
search for templates in the PDB70 database. We only selected templates 
with at least 40% sequence identity (and e-value< 10~^) with the patho- 
gen query sequence. If more than one template was found, the one with 
the highest coverage to the pathogen sequence was selected.

Fig. 1. The figure shows the results of 'all pathogens' filtered by the
'ATP binding' GO term query in the TiPs database. The output table
lists all putative pathogen targets. Each table row reports the known
and predicted target UniProt IDs, their overall sequence identity, their
binding site identity and rmsd, whether there are clashes between the
known drug and the predicted target, and whether there are insertions
or deletions near the binding site in the alignment used to model the
protein. For each hit, the system also shows details of the structure(s) and
the binding site(s) in a Jmol window and the corresponding Ligplot
drawings.

Models were generated using the Modeller software. Note that the best template used
to build the model corresponds to the original structure in the drug-target 
complex only in 153 cases, whereas in all the other cases, the best template 
was a different structure. 
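
The binding-site definition (3.5 Å cutoff) and the coverage and identity measures described earlier in this section can be sketched in a few lines. This is an illustrative reading of the text, not the code used to build TiPs: the function names, the input layout (residue number plus coordinates, two rows of one alignment) and the toy data are assumptions.

```python
import math


def binding_site_residues(target_atoms, drug_atoms, cutoff=3.5):
    """Residue numbers with at least one atom within `cutoff` angstroms of any drug atom.

    target_atoms: iterable of (residue_number, (x, y, z))
    drug_atoms:   iterable of (x, y, z)
    """
    drug_atoms = list(drug_atoms)
    site = set()
    for res_num, coord in target_atoms:
        if any(math.dist(coord, d) <= cutoff for d in drug_atoms):
            site.add(res_num)
    return sorted(site)


def site_conservation(aligned_target, aligned_pathogen, site_positions, gap="-"):
    """Return (coverage %, identity %) of the binding site for one pathogen sequence.

    aligned_target, aligned_pathogen: two rows of the same alignment.
    site_positions: 1-based binding-site positions in the ungapped target sequence.
    """
    if len(aligned_target) != len(aligned_pathogen):
        raise ValueError("alignment rows must have equal length")
    site = set(site_positions)
    aligned = identical = 0
    target_pos = 0
    for t, p in zip(aligned_target, aligned_pathogen):
        if t == gap:
            continue  # column does not correspond to a target residue
        target_pos += 1
        if target_pos in site and p != gap:
            aligned += 1
            identical += (t == p)
    coverage = 100.0 * aligned / len(site) if site else 0.0
    identity = 100.0 * identical / aligned if aligned else 0.0
    return coverage, identity


# Toy coordinates and sequences, invented for illustration only.
atoms = [(10, (0.0, 0.0, 0.0)), (10, (1.0, 0.0, 0.0)),
         (11, (5.0, 0.0, 0.0)), (12, (10.0, 0.0, 0.0))]
drug = [(0.0, 0.0, 3.0), (5.0, 3.0, 0.0)]
site = binding_site_residues(atoms, drug)

coverage, identity = site_conservation("MK-TAYIAK", "MKQTG-IAR", [3, 4, 5, 6, 8])
keep = coverage >= 80.0  # the coverage filter stated in the text
print(site, coverage, identity, keep)
```

In the toy alignment, four of the five binding-site residues are aligned to a pathogen residue and two of those four are identical, so the sequence would pass the 80% coverage filter with 50% binding-site identity.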

The binding site residues of the original complex and of the predicted 
target were structurally superimposed using the LGA software (Zemla, 
2003). Subsequently, the ligands were transferred into the structure or 
model of the pathogen proteins that could be successfully superimposed 
within 5 Å of the known target. Binding sites in the modelled structures
were analysed for the occurrence of nearby insertions/deletions.
These cases are suitably highlighted in the TiPs database search output. 
This allows users to analyse them to establish the likelihood that their 
presence affects the conformation of the binding site. 
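
The acceptance test applied after superposition can be sketched as follows. This sketch makes two assumptions that the text does not confirm: that the 5 Å cutoff refers to the rmsd over paired binding-site atoms, and that the coordinates passed in have already been superimposed (the text uses LGA for that step, which is not reproduced here). Function names and coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def can_transfer_ligand(site_known, site_predicted, max_rmsd=5.0):
    """True if the superimposed binding sites agree to better than max_rmsd angstroms.

    Assumption: the 5 angstrom cutoff is interpreted as an rmsd threshold.
    """
    return rmsd(site_known, site_predicted) < max_rmsd


# Toy binding-site atoms, invented for illustration; assumed already superimposed.
known = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
close = [(x, y, z + 1.0) for x, y, z in known]  # every atom displaced by 1 angstrom
far = [(x, y, z + 6.0) for x, y, z in known]    # every atom displaced by 6 angstroms

print(rmsd(known, close), can_transfer_ligand(known, close))
print(rmsd(known, far), can_transfer_ligand(known, far))
```

Only pairs that pass such a test would receive the transferred ligand coordinates; a check for nearby insertions or deletions would still have to be done separately on the alignment.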



3 RESULTS 

TiPs currently contains 4071 candidate pathogen target struc- 
tures involved in 5153 different drug-target complexes in 
150 pathogens. All entries are thoroughly annotated with
both sequence and functional information. The database can
be queried by organism name (genus or species name), protein
family or function (EC number, GO terms and Pfam), as well as 
UniProt ID. The query returns a sortable table providing infor- 
mation about both known and predicted drug-target pairs and 
links to visualize specific information on the drug(s) (physico- 
chemical properties, structure, indication and side effects), 
the target(s) [UniProt annotation and PDB structure(s)] and to 
visually analyse or download their 3D complexes. Ligplot 
(Laskowski and Swindells, 2011) drawings of both the known 
and inferred binding sites in complex with the drug are available 
as well (Fig. 1). 